Description



Application Field 



The present invention relates to a novel cDNA originating from an mRNA expressed in human cells and to a
galectin-4-like protein encoded by this cDNA. The human cDNA of the present invention can be used as a probe for
gene diagnosis and as a gene source for gene therapy. Furthermore, the cDNA can be used as a gene source for
large-scale production of the protein encoded by said cDNA. The protein of the present invention can be used as a
pharmaceutical or as a reagent for glycogenic research.



Prior Art 



Galectin is the general term for animal lectins that bind to galactose. Animal lectins exist at many sites such as
the cytoplasm, the nucleus, and the cell membrane surface, and are considered to be involved in cell proliferation,
differentiation, canceration, metastasis, immunity, and so on [Drickamer, K., Annu. Rev. Cell Biol., 9: 237-264
(1993)]. Among them, galectin-4 was found as a lectin contained abundantly in rat intestinal extract. Galectin-4
is expressed specifically in the digestive tract, such as the stomach and intestines, and is abundant in the mucous
membrane, so it is presumed to be a protein essential for maintaining the functions of these organs. Although a rat
cDNA encoding galectin-4 has been cloned [Oda, Y. et al., J. Biol. Chem., 268: 5929-5939 (1993)], no report has
yet been presented on a cDNA encoding a human galectin-4-like protein.

Disclosure of the Invention



As the result of intensive studies, the present inventors succeeded in cloning a human cDNA encoding a
galectin-4-like protein, thereby completing the present invention. That is to say, the present invention provides a
protein containing the amino acid sequence represented by Sequence No. 1, which is a human galectin-4-like protein.
The present invention also provides a DNA encoding said protein, exemplified by a cDNA containing the base sequence
represented by Sequence No. 2 or No. 3.

The human cDNA of the present invention can be cloned from a cDNA library of human cell origin. This cDNA library
is constructed using as a template a poly(A)+ RNA extracted from human cells. The human cells may be cells excised
from the human body, for example by surgery, or may be cultured cells. A poly(A)+ RNA isolated from stomach cancer
tissue is used in the Examples. The cDNA can be synthesized by any method selected from the Okayama-Berg method
[Okayama, H. and Berg, P., Mol. Cell. Biol., 2: 161-170 (1982)], the Gubler-Hoffman method [Gubler, U. and
Hoffman, J., Gene, 25: 263-269 (1983)], and so on, but it is preferred to use the capping method [Kato, S. et al.,
Gene, 150: 243-250 (1994)] as illustrated in the Examples in order to obtain a full-length clone efficiently. The
cDNA is identified by determining the whole base sequence by sequencing, searching for known proteins having
sequences similar to the amino acid sequence predicted from the base sequence, expressing the protein by in vitro
translation, expressing it in Escherichia coli, and measuring the activity of the expressed product. The activity
is measured by confirming binding to lactose.

The cDNA of the present invention is characterized by containing a base sequence encoding the amino acid sequence
represented by Sequence No. 1, as exemplified by that represented by Sequence No. 2. For example, the cDNA
represented by Sequence No. 3 possesses a 1113-bp base sequence with a 972-bp open reading frame. This open reading
frame codes for a protein consisting of 323 amino acid residues. This protein shows a high similarity of 76.3% to
rat galectin-4 at the amino acid sequence level.
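
As a quick consistency check on the lengths quoted above, the arithmetic can be sketched in Python. The region
sizes (56-bp 5'-nontranslation region, 972-bp open reading frame, 85-bp 3'-nontranslation region) are those
reported for clone HP01049 in the Examples. The assumption that the 972-bp open reading frame counts the stop codon
is an interpretation, made because 323 codons alone account for only 969 bp, the length given for Sequence No. 2.

```python
# Region sizes reported for clone HP01049 (Sequence No. 3); the poly(A) tail is not counted.
FIVE_PRIME_UTR_BP = 56
ORF_BP = 972
THREE_PRIME_UTR_BP = 85

RESIDUES = 323      # length of Sequence No. 1
STOP_CODON_BP = 3   # assumption: the quoted ORF length includes the stop codon


def coding_bp(residues: int) -> int:
    """Number of bases needed to encode `residues` amino acids (no stop codon)."""
    return 3 * residues


total_bp = FIVE_PRIME_UTR_BP + ORF_BP + THREE_PRIME_UTR_BP
print(total_bp)                               # 1113
print(coding_bp(RESIDUES))                    # 969
print(coding_bp(RESIDUES) + STOP_CODON_BP)    # 972
```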

The same clone as the cDNA of the present invention can easily be obtained by screening a human cDNA library
constructed from gastrointestinal tissues or gastrointestinal cell lines with an oligonucleotide probe synthesized
on the basis of the cDNA base sequence shown in Sequence No. 3.

In general, polymorphism due to individual differences is frequently observed in human genes. Therefore, any cDNA
in which one or more nucleotides have been inserted, deleted, and/or substituted with other nucleotides in the base
sequence encoding the amino acid sequence represented by Sequence No. 1 or in Sequence No. 3 shall come within the
scope of the present invention.

In a similar manner, any protein produced as a result of such modifications, namely insertion or deletion of one or
more nucleotides and/or substitution with other nucleotides, shall come within the scope of the present invention.

The cDNA of the present invention includes cDNA fragments (of more than 10 bp) containing any partial base sequence
of the base sequence represented by Sequence No. 2 or No. 3. DNA fragments consisting of a sense chain and an
anti-sense chain also come within this scope. These DNA fragments can be used as probes for gene diagnosis.
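
To illustrate the relation between a sense-chain fragment and its anti-sense chain, the following Python sketch
computes a reverse complement. The 20-base fragment is the start of the coding region as it appears in primer PR1
of the Examples; the function name `reverse_complement` is a hypothetical helper and not part of the disclosure.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the anti-sense chain (read 5'->3') of a sense-chain fragment."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# First 20 bases of the coding region, as carried by primer PR1 after its AatII site.
sense = "ATGGCCTATGTCCCCGCACC"
antisense = reverse_complement(sense)

print(len(sense) > 10)                          # True: longer than the 10-bp lower limit
print(reverse_complement(antisense) == sense)   # True: applying it twice restores the input
```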

The protein of the present invention can be expressed in vitro by preparing an RNA by in vitro transcription from a
vector having a cDNA of the present invention, followed by in vitro translation using this RNA as a template. Also,
recombination of the translated region into a suitable expression vector by a method known in the art allows a large
amount of the encoded protein to be expressed in Escherichia coli, Bacillus subtilis, yeasts, animal cells, and so
on. Alternatively, the peptide can be prepared by chemical synthesis on the basis of the amino acid sequence shown
in the sequence table below.

Any fusion protein with another optional protein is included in the scope of the present invention, as long as it
possesses lactose-binding activity. Examples include the fusion protein with the maltose-binding protein illustrated
in the Examples.



Brief Description of the Drawings

Figure 1 depicts the structure of the plasmid pHP01049 of the present invention.
Figure 2 depicts the structure of the Escherichia coli expression vector pMAKGAL4 of the present invention.


Best Mode for Carrying Out the Invention



The present invention is described in more detail by the following examples, but these examples are not intended to
restrict the present invention. The basic operations and enzyme reactions for DNA recombination were carried out
according to the literature ["Molecular Cloning: A Laboratory Manual", Cold Spring Harbor Laboratory, 1989]. Unless
otherwise stated, the restriction enzymes and the various modification enzymes used were those available from TAKARA
SHUZO. The manufacturer's instructions were followed for the buffer compositions and the reaction conditions of each
enzyme reaction. The cDNA synthesis was carried out according to the literature [Kato, S. et al., Gene, 150: 243-250
(1994)].

Examples 



Preparation of Poly(A)+ RNA



After 1 g of a human stomach cancer tissue was homogenized in 20 ml of a 5.5 M guanidinium thiocyanate solution,
750 µg of mRNA was prepared according to the literature [Okayama, H. et al., "Methods in Enzymology", Vol. 164,
Academic Press, 1987]. This was subjected to oligo(dT)-cellulose column chromatography, with the column washed with
a 20 mM Tris-hydrochloric acid buffer solution (pH 7.6) containing 0.5 M NaCl and 1 mM EDTA, to obtain 10 µg of a
poly(A)+ RNA according to the literature mentioned above.

Construction of cDNA Library 



Ten micrograms of the above-described poly(A)+ RNA was dissolved in a 100 mM Tris-hydrochloric acid buffer solution
(pH 8), one unit of an RNase-free bacterial alkaline phosphatase was added, and the reaction was run at 37°C for one
hour. After the reaction mixture was subjected to phenol extraction followed by ethanol precipitation, the pellet
was dissolved in a solution containing 50 mM sodium acetate (pH 6), 1 mM EDTA, 0.1% 2-mercaptoethanol, and 0.01%
Triton X-100. Thereto was added one unit of a tobacco acid pyrophosphatase (Epicentre Technologies), and the
resulting mixture, in a total volume of 100 µl, was reacted at 37°C for one hour. After the reaction mixture was
subjected to phenol extraction followed by ethanol precipitation, the pellet was dissolved in water to obtain a
solution of a decapped poly(A)+ RNA.

The decapped poly(A)+ RNA and 3 nmol of a chimeric DNA-RNA oligonucleotide
(5'-dG-dG-dG-dG-dA-dA-dT-dT-dC-dG-dA-G-G-A-3') were dissolved in a solution containing 50 mM Tris-hydrochloric acid
buffer (pH 7.5), 0.5 mM ATP, 5 mM MgCl2, 10 mM 2-mercaptoethanol, and 25% polyethylene glycol, whereto was added 50
units of T4 RNA ligase, and the resulting mixture, in a total volume of 30 µl, was reacted at 20°C for 12 hours.
After the reaction mixture was subjected to phenol extraction followed by ethanol precipitation, the pellet was
dissolved in water to obtain a chimeric-oligo-capped poly(A)+ RNA.

After digestion of a vector pKA1 (Japanese Patent Kokai Publication No. 1992-117292) developed by the present
inventors with KpnI, about 60 dT tails were added using a terminal transferase. The vector primer used below was
prepared by digesting this addition product with EcoRV to remove the dT tail at one side.

After 6 µg of the previously prepared chimeric-oligo-capped poly(A)+ RNA was annealed with 1.2 µg of the vector
primer, the resulting mixture was dissolved in a solution containing 50 mM Tris-hydrochloric acid buffer (pH 8.3),
75 mM KCl, 3 mM MgCl2, 10 mM dithiothreitol, and 1.25 mM dNTP (dATP + dCTP + dGTP + dTTP), 200 units of a reverse
transcriptase (GIBCO-BRL) were added, and the reaction was run in a total volume of 20 µl at 42°C for one hour.
After the reaction mixture was subjected to phenol extraction followed by ethanol precipitation, the pellet was
dissolved in a solution containing 50 mM Tris-hydrochloric acid buffer (pH 7.5), 100 mM NaCl, 10 mM MgCl2, and 1 mM
dithiothreitol. Thereto were added 100 units of EcoRI, and the resulting mixture, in a total volume of 20 µl, was
reacted at 37°C for one hour. After the reaction mixture was subjected to phenol extraction followed by ethanol
precipitation, the pellet was dissolved in a solution containing 20 mM Tris-hydrochloric acid buffer (pH 7.5), 100
mM KCl, 4 mM MgCl2, 10 mM (NH4)2SO4, and 50 µg/ml of bovine serum albumin. Thereto were added 60 units of an
Escherichia coli DNA ligase, and the resulting mixture was reacted at 16°C for 16 hours. To the reaction mixture
were added 2 µl of 2 mM dNTP, 4 units of Escherichia coli DNA polymerase I, and 0.1 unit of Escherichia coli RNase
H, and the resulting mixture was reacted at 12°C for one hour and then at 22°C for one hour.

Next, the cDNA-synthesis reaction solution was used for transformation of Escherichia coli DH12S (GIBCO-BRL). The
transformation was carried out by the electroporation method. A portion of the transformants was spread on 2xYT
agar culture medium containing 100 µg/ml ampicillin and incubated at 37°C overnight. Colonies formed on the agar
medium were picked at random and each was inoculated into 2 ml of 2xYT culture medium containing 100 µg/ml
ampicillin. After incubation at 37°C for 2 hours, the culture was infected with a helper phage M13KO7 (Pharmacia)
and incubated further at 37°C overnight. The culture solution was centrifuged to separate the bacterial cells and
the supernatant; a double-stranded plasmid DNA was isolated from the cells by the alkaline lysis method and a
single-stranded phage DNA from the supernatant according to the conventional method. After double digestion with
EcoRI and NotI, the double-stranded plasmid DNA was subjected to 0.8% agarose gel electrophoresis to determine the
size of the cDNA insert. Separately, after the sequencing reaction using an M13 universal primer labeled with a
fluorescent dye and a Taq polymerase (a kit of Applied Biosystems), the single-stranded phage DNA was examined with
a fluorescent DNA sequencer (Applied Biosystems) to determine a base sequence of about 400 bp at the 5'-terminus of
the cDNA. The sequence data were filed in the Homo Protein cDNA Bank database.

cDNA Cloning

The clones selected at random from the above-mentioned cDNA library were sequenced, and each base sequence obtained
was translated in three frames into amino acid sequences, which were subjected to a search of the protein database.
The analysis software used was GENETYX-MAC (Software Development). As the result, a protein encoded by the clone
HP01049 was revealed to be similar to rat galectin-4 at the amino acid sequence level. The structure of this
plasmid is depicted in Figure 1. Determination of the whole base sequence of the cDNA insert revealed a structure
consisting of a 56-bp 5'-nontranslation region, a 972-bp open reading frame, an 85-bp 3'-nontranslation region, and
a 37-bp poly(A) tail (Sequence No. 3). The open reading frame codes for a protein consisting of 323 amino acid
residues, and a search of the protein database using this sequence revealed a high similarity of 76.3% to the rat
galectin-4 amino acid sequence over the whole region. Table 1 shows the comparison between the amino acid sequence
of the human galectin-4-like protein of the present invention (HS) and that of rat galectin-4 (RN). Therein, the
marks - (minus), * (asterisk), and . (dot) represent a gap, an amino acid residue identical with that of the protein
of the present invention, and an amino acid residue similar to that of the protein of the present invention,
respectively.






EP0 841 393 A1 



Table 1 

HS MAYVPAPGYQPTYNPTLPYYQPIPGGLNVGMSVYIQGVASEHMKRFFVNFVVGQDPGSDV
******************* . ******. ****. ****. *...*.** ***. * m * t
RN MAYVPAPGYQPTY.NTTLPYKRP I PGGLSVGMS IYIQGI AKDNMRRFHVNFA VGQDEC AD I

HS AFHFNPRFDGWDKVVFNTLQGGKWGSEERKRSMPFKKGAAFELVFIVLAEHYKVVVNGNP
******************.*.*.**.**.*.****.** *****.*.. *********.*
RN AFHFNPRFIODKVVFNTMQS^

HS FYEYGHRLPLQMVTHLQVDGDLQLQSINFIGGQPLRPQ--GPPMMPPYPGPGHCHQQLNS
*******************^ * * .*.**..*. ..*.**
RN FYEYGHWPLQMVTHl^^

HS LPTMEGPPTFNPPVPYFGRLQGGLTARRTIIIKGYVPPTGKSFAINFKVGSSGDIALHIN
**. *. ***. ******* * ***************** **. *. . ******* ****. *. *
RN LPVMAGPPI FNPPVPYVGTLQGGLTARRT1 1 IKGYVLPTAKNL1 INFKVGSTGD1 AFHMN

HS PRMGNGTVVRNSLLNGSWGSEEKKITHNPFGPGQFFDLSIRCGLDRFKVYANGQHLFDFA
**. *. ***** . ********. **. . ****. ************ *****. *********
RN PR I GD-CWRNSYMNGSIGSEERK I PYNPFGAGQFFDLS I RCGTDRFKVFANGQHLFDFS

HS HRLSAFQRVDTLEIQGDVTLSYVQI
**. ****** ***. **. *******
RN HRFQAFQRVDMLE IKGDITLSYVQI



A search of the base sequence databases GenBank™/EMBL/DDBJ using the obtained cDNA sequence revealed that the EST
database has registered a cDNA partial sequence (Accession No. D25577) partially consistent with the
3'-nontranslation region (No. 1022 to No. 1113) in the cDNA of the present invention represented by Sequence No. 3.
Nevertheless, the consistency of a partial sequence does not assure that said fragment and the full-length cDNA of
the present invention originate from the same mRNA. Furthermore, this partial sequence alone indicates neither the
amino acid sequence nor the function of the putatively encoded protein.
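
The three-frame conversion of a sequence read into amino acid sequences, as used for the database search in this
section, can be sketched as follows. This is a minimal illustration in Python and is not the GENETYX-MAC
implementation: the function names are hypothetical, the standard genetic code is assumed, and the 20-base input is
the start of the coding region as carried by primer PR1 in the Examples.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate `seq` in the given forward frame (0, 1 or 2); 'X' marks unknown codons."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frames(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]


read = "ATGGCCTATGTCCCCGCACC"   # coding-region start, taken from primer PR1
for frame, peptide in enumerate(three_frames(read), start=1):
    print(frame, peptide)
```

Only the first frame of this fragment gives Met-Ala-Tyr-Val-Pro-Ala, the start of Sequence No. 1; in a real search
each of the three translations would be compared against the protein database.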

Protein Synthesis by In Vitro Translation



The vector pHP01049 having the cDNA of the present invention was used for in vitro translation with a TNT rabbit
reticulocyte lysate kit (Promega). In this case, [35S]methionine was added to label the expression product with a
radioisotope. Each of the reactions was carried out according to the protocols attached to the kit. Two micrograms
of the plasmid pHP01049 was reacted at 30°C for 90 minutes in a total volume of 100 µl of a reaction mixture
containing 50 µl of the TNT rabbit reticulocyte lysate, 4 µl of a buffer solution (attached to the kit), 2 µl of an
amino acid mixture (Met-free), 8 µl of [35S]methionine (Amersham) (0.37 MBq/µl), 2 µl of T7 RNA polymerase, and 80
U of RNasin. To 3 µl of the resulting reaction mixture was added 2 µl of the SDS sampling buffer (125 mM
Tris-hydrochloric acid buffer, pH 6.8, 120 mM 2-mercaptoethanol, 2% SDS solution, 0.025% bromophenol blue, and 20%
glycerol), and the resulting mixture was heated at 95°C for 3 minutes and then subjected to SDS-polyacrylamide gel
electrophoresis. Determination of the molecular weight of the translation product by autoradiography indicated that
the cDNA of the present invention yielded a translation product with a molecular mass of about 36 kDa. This value
is consistent with the molecular weight of 35940 predicted for the putative protein from the base sequence
represented by Sequence No. 3, thereby indicating that the cDNA indeed codes for the protein represented by Sequence
No. 3.
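
The predicted molecular weight quoted above can be approximated from Sequence No. 1 by summing average residue
masses and adding one water molecule. The sketch below is illustrative only: the residue masses are standard average
values, the helper name `molecular_weight` is hypothetical, and the source does not state which software or mass
table produced the figure of 35940, so small differences are to be expected.

```python
# Average residue masses in Da (amino acid minus one water) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added back for the free N- and C-termini

# Sequence No. 1 transcribed into one-letter code, 60 residues per line.
SEQ1 = (
    "MAYVPAPGYQPTYNPTLPYYQPIPGGLNVGMSVYIQGVASEHMKRFFVNFVVGQDPGSDV"
    "AFHFNPRFDGWDKVVFNTLQGGKWGSEERKRSMPFKKGAAFELVFIVLAEHYKVVVNGNP"
    "FYEYGHRLPLQMVTHLQVDGDLQLQSINFIGGQPLRPQGPPMMPPYPGPGHCHQQLNSLP"
    "TMEGPPTFNPPVPYFGRLQGGLTARRTIIIKGYVPPTGKSFAINFKVGSSGDIALHINPR"
    "MGNGTVVRNSLLNGSWGSEEKKITHNPFGPGQFFDLSIRCGLDRFKVYANGQHLFDFAHR"
    "LSAFQRVDTLEIQGDVTLSYVQI"
)


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


print(len(SEQ1))                       # number of residues
print(round(molecular_weight(SEQ1)))   # approximate molecular weight in Da
```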

Measurement of Lactose-Binding Activity of In-Vitro Translation Product

After 100 ml of a Sepharose-4B gel suspension (Pharmacia) was washed well with 0.5 M sodium carbonate, the gel was
suspended in 100 ml of 0.5 M sodium carbonate. Thereto was added 10 ml of vinyl sulfone, and the resulting mixture
was gently stirred at room temperature for one hour. After washing with 0.5 M sodium carbonate, the gel was suspended
in a solution of 10% lactose and 0.5 M sodium carbonate, and the suspension was stirred gently overnight at room
temperature. The resulting gel was washed in order with 0.5 M sodium carbonate, water, and 0.05 M phosphate buffer
(pH 7.0). The thus-obtained lactosyl-Sepharose-4B gel was stored at 4°C in the 0.05 M phosphate buffer (pH 7.0)
containing 0.02% sodium azide.

By chromatography of 100 µl of the in vitro translation solution on Sephadex G-75, the unreacted [35S]methionine
was removed and the fractions containing the 36-kDa translation product were collected. These fractions were applied
to the previously prepared lactosyl-Sepharose-4B column (bed volume: 4.5 ml), which was washed with 20 ml of a
column buffer for the lactose column (20 mM Tris-hydrochloric acid buffer, pH 7.5, 2 mM EDTA, 150 mM NaCl, 4 mM
2-mercaptoethanol, and 0.01% Triton X-100) and eluted with 20 ml of the column buffer containing 0.3 M lactose. As
the result, the eluate was found to contain the 36-kDa translation product, indicating that the protein of the
present invention possesses lactose-binding activity.

Expression of Fusion Protein by Escherichia coli

After the digestion of 1 µg of the plasmid pHP01049 with 20 units of NotI, blunting was performed by treatment with
the Klenow enzyme. Then, after digestion with PstI followed by 1% agarose gel electrophoresis, a 1.2-kbp fragment was
excised from the gel.

Next, after 1 µg of pMAL™-c2 (New England Biolabs) was digested with 20 units of HindIII, blunting was performed by
treatment with the Klenow enzyme. Then, after digestion with PstI followed by 1% agarose gel electrophoresis, a
6.7-kbp DNA fragment was excised from the gel. The vector fragment and the cDNA fragment were ligated by using a
ligation kit and then Escherichia coli JM109 was transformed. Plasmid pMALGAL4 was prepared from the transformant
and the objective recombinant was identified by the restriction enzyme cleavage map.

Ten milliliters of an overnight culture of pMALGAL4/JM109 was inoculated into 500 ml of the Rich culture medium
(containing 10 g of tryptone, 5 g of yeast extract, 5 g of NaCl, and 2 g of glucose per liter) and incubated in a
shaker at 37°C, and isopropylthiogalactoside was added to a concentration of 1 mM when A600 reached about 0.5. After
further incubation at 37°C for 3 hours, the bacterial cells collected by centrifugation were suspended in 25 ml of a
column buffer for the amylose column (10 mM Tris-hydrochloric acid, pH 7.4, 200 mM NaCl, and 1 mM EDTA). After
sonication, the suspension was centrifuged and the supernatant was applied to an amylose column (New England
Biolabs) with a 3.5-ml bed volume. After the column was washed with 8 column volumes of the column buffer, a
maltose-binding protein/galectin-4-like protein fusion protein was eluted with 20 ml of the column buffer containing
10 mM maltose to afford 10.9 mg of the fusion protein. SDS-polyacrylamide gel electrophoresis of this fusion protein
showed a single band at the position of about 81 kDa. This molecular mass value is consistent with the molecular
weight predicted for the maltose-binding protein/galectin-4-like protein fusion protein.

Measurement of Lactose-Binding Activity of Fusion Protein

The above-prepared fusion protein was applied to the previously prepared lactosyl-Sepharose-4B column (bed volume:
4.5 ml), which was washed with 20 ml of the column buffer for the lactose column and eluted with 20 ml of the column
buffer containing 0.3 M lactose. SDS-polyacrylamide gel electrophoresis of the eluted protein showed a single band at
the 81-kDa position, indicating that the maltose-binding protein/galectin-4-like protein fusion protein obtained by
expression in Escherichia coli possesses lactose-binding activity.

Expression of Galectin-4-Like Protein by Escherichia coli 

After the digestion of 1 µg of the plasmid pHP01049 with 20 units of NotI, blunting was performed by treatment with
the Klenow enzyme. Then, after digestion with AatII followed by 0.8% agarose gel electrophoresis, a fragment of about
1 kbp was excised from the gel. Next, after 1 µg of the Escherichia coli expression vector pMPRA3 (Japanese Patent
Kokai Publication No. 1990-182186), having a tac promoter, a metapyrocatechase SD sequence, and an rrnBT1T2
terminator, was digested with 20 units of AatII and with SmaI, followed by 0.8% agarose gel electrophoresis, a DNA
fragment of about 2.8 kbp was excised from the gel. The two DNA fragments were ligated by using a ligation kit and
then Escherichia coli JM109 was transformed. Plasmid pMAKGAL4-AatII was prepared from the transformant and the
objective recombinant was identified by the restriction enzyme cleavage map.

Oligonucleotide primers PR1 (5'-GGGACGTCATGGCCTATGTCCCCGCACC-3') and PR2 (5'-GGCGACGTCTGAGCCCGGATCCTGCCC-3') were
synthesized using a DNA synthesizer (Applied Biosystems) according to the attached protocol. The 5'-translation
region was amplified with a PCR kit (TAKARA SHUZO) using 1 ng of plasmid pHP01049 and 100 pmole each of primers PR1
and PR2. After phenol extraction and ethanol precipitation, followed by digestion with 20 units of AatII (TOYOBO),
the reaction product was subjected to 1.5% agarose gel electrophoresis, and a DNA fragment of about 190 bp was
excised and purified.
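
The design of the two primers can be checked with a short Python sketch. It is an illustration under stated
assumptions and not part of the original procedure: the helper names are hypothetical, GACGTC is taken as the AatII
recognition sequence, and the standard genetic code is assumed. The sketch locates the AatII site in each primer and
shows that the reverse complement of PR2, read in frame, encodes Gly-Gln-Asp-Pro-Gly-Ser-Asp-Val-Ala, which
corresponds to residues 53 to 61 of Sequence No. 1.

```python
from itertools import product

PR1 = "GGGACGTCATGGCCTATGTCCCCGCACC"   # forward primer, as given in the text
PR2 = "GGCGACGTCTGAGCCCGGATCCTGCCC"    # reverse primer, as given in the text
AATII_SITE = "GACGTC"                  # AatII recognition sequence (assumption stated above)

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons of `seq`, starting at its first base."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


print(PR1.find(AATII_SITE), PR2.find(AATII_SITE))   # 2 3: each primer carries one AatII site
print(translate(PR1[8:]))                            # MAYVPA: coding region follows the site in PR1
print(translate(reverse_complement(PR2)))            # GQDPGSDVA
```

Because PR2 anneals over the AatII site that lies inside the coding region, AatII digestion of the amplified product
leaves a short 5'-terminal fragment whose ends are both AatII ends.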

After 1 µg of plasmid pMAKGAL4-AatII was digested with 20 units of AatII, a 3.8-kbp DNA fragment was excised from
the gel. This DNA fragment and the about 190-bp DNA fragment previously prepared by PCR were ligated by using a
ligation kit and then Escherichia coli JM109 was transformed. Plasmid pMAKGAL4 was prepared from the transformant and
the objective recombinant was identified by the restriction enzyme cleavage map. Figure 2 depicts the structure of
the obtained plasmid.

Ten milliliters of an overnight culture of pMAKGAL4/JM109 was inoculated into 100 ml of the LB culture medium
containing 100 µg/ml of ampicillin and incubated in a shaker at 37°C, and isopropylthiogalactoside was added to a
concentration of 1 mM when A600 reached about 0.5. After further incubation at 37°C for 3 hours, the bacterial cells
collected by centrifugation were suspended in 25 ml of the column buffer for the lactose column. After sonication,
the suspension was centrifuged and the supernatant was applied to the previously prepared lactosyl-Sepharose-4B
column (4.5-ml bed volume). The column was washed with 20 ml of the column buffer for the lactose column and then
eluted with 20 ml of the column buffer containing 0.3 M lactose. SDS-polyacrylamide gel electrophoresis of the eluted
protein showed a single band at the position of 36 kDa. This molecular mass value is consistent with the molecular
weight predicted for the human galectin-4-like protein. That is to say, the human galectin-4-like protein expressed
in Escherichia coli was shown to possess lactose-binding activity.

Probable Industrial Applicability

The present invention provides a human cDNA encoding a galectin-4-like protein and a protein encoded by this human
cDNA. Said recombinant protein can be expressed in a large amount by using the cDNA of the present invention. Said
recombinant protein can be used as a pharmaceutical, particularly for the treatment of digestive tract diseases, or
as a research reagent, particularly as a reagent for glycogenic research.

Sequence No.: 1
Sequence length: 323 
Sequence type: Amino acid 
Sequence kind: Protein 
Sequence description 

Met Ala Tyr Val Pro Ala Pro Gly Tyr Gln Pro Thr Tyr Asn Pro Thr
1               5                   10                  15

Leu Pro Tyr Tyr Gln Pro Ile Pro Gly Gly Leu Asn Val Gly Met Ser
            20                  25                  30

Val Tyr Ile Gln Gly Val Ala Ser Glu His Met Lys Arg Phe Phe Val
        35                  40                  45

Asn Phe Val Val Gly Gln Asp Pro Gly Ser Asp Val Ala Phe His Phe
    50                  55                  60

Asn Pro Arg Phe Asp Gly Trp Asp Lys Val Val Phe Asn Thr Leu Gln
65                  70                  75                  80

Gly Gly Lys Trp Gly Ser Glu Glu Arg Lys Arg Ser Met Pro Phe Lys
                85                  90                  95

Lys Gly Ala Ala Phe Glu Leu Val Phe Ile Val Leu Ala Glu His Tyr
            100                 105                 110

Lys Val Val Val Asn Gly Asn Pro Phe Tyr Glu Tyr Gly His Arg Leu
        115                 120                 125

Pro Leu Gln Met Val Thr His Leu Gln Val Asp Gly Asp Leu Gln Leu
    130                 135                 140

Gln Ser Ile Asn Phe Ile Gly Gly Gln Pro Leu Arg Pro Gln Gly Pro
145                 150                 155                 160

Pro Met Met Pro Pro Tyr Pro Gly Pro Gly His Cys His Gln Gln Leu
                165                 170                 175

Asn Ser Leu Pro Thr Met Glu Gly Pro Pro Thr Phe Asn Pro Pro Val
            180                 185                 190

Pro Tyr Phe Gly Arg Leu Gln Gly Gly Leu Thr Ala Arg Arg Thr Ile
        195                 200                 205

Ile Ile Lys Gly Tyr Val Pro Pro Thr Gly Lys Ser Phe Ala Ile Asn
    210                 215                 220

Phe Lys Val Gly Ser Ser Gly Asp Ile Ala Leu His Ile Asn Pro Arg
225                 230                 235                 240

Met Gly Asn Gly Thr Val Val Arg Asn Ser Leu Leu Asn Gly Ser Trp
                245                 250                 255

Gly Ser Glu Glu Lys Lys Ile Thr His Asn Pro Phe Gly Pro Gly Gln
            260                 265                 270

Phe Phe Asp Leu Ser Ile Arg Cys Gly Leu Asp Arg Phe Lys Val Tyr
        275                 280                 285

Ala Asn Gly Gln His Leu Phe Asp Phe Ala His Arg Leu Ser Ala Phe
    290                 295                 300

Gln Arg Val Asp Thr Leu Glu Ile Gln Gly Asp Val Thr Leu Ser Tyr
305                 310                 315                 320

Val Gln Ile

Sequence No.: 2

Sequence length: 969 

Sequence type: Nucleic acid 

Strandedness : Double 

Topology: Linear 

Sequence kind: cDNA to mRNA 

Origin: 

Sequence description 

ATGGOCTATG TOCOCGCACC GCGCTAOCAG COCACCTACA ACOCGAOGCT GOCTTACTAC 60 

CAGOOCATOC CGGGCGGGCT CAACGTGGGA ATGTCTGTTT ACATOCAAGG AGTGGOCAGC 120 

GAGCACATGA A(£GGTTCTT CGTGAACTTT CTGGTTGGGC AGGATCOGGG CTCAGACGTC 180 

GCCTTCCACT TCAATCCCCG GTTTGACGGC TGGGACAAGG TGGTCTTCAA CACGTTGCAG 240 

GGCGGGAAGT GGGGCAGCGA GGAGAGGAAG AGGAGCATGC OCTTCAAAAA GGGTGCCGOC 300 

TTTGAGCTGG TCTTCATAGT OCTGGCTGAC CACTACAAGG TGGTGGTAAA TGGAAATOCC 360 

TTCTATGAGT ACGGGCAOCG GCTTOCOCTA CAGATGGTCA COCACCTGCA AGTGGATGGG 420 

GATCTGCAAC TTCAATCAAT CAACTTCATC GGAGGCCAGC CJCCTOCGGCC CCAGGGACCC 480 

CCCATGATGC CACCTTACCC TGGTOOCGGA CATTGOCATC AACAGCTGAA CAGOCTGCOC 540 

ACCATGGAAG GAOC00CAAC CTTCAAOCCG CCTGTGCCAT ATTTCGGGAG GCTGCAAGGA 600 

GGGCTCACAG CTCGAAGAAC CATCATCATC AAGGGCTATG TGOCTCCCAC AGGCAAGAGC 660 

TTTGCTATCA ACTTCAAGGT GWOCCTCA GGGGACATAG CTCTGCACAT TAATCCCCGC 720 

ATGGGCAACG GTAOCGTGGT CCGGAACAGC CTTCTGAATG GCTCCTGGGG ATOCGAGGAG 780 

AAGAAGATCA COCACAAOCC ATTTGGTCCC GGACAGTTCT TTGATCTGTC CATTCGCTGT 840 

GGCTTGGATC GCTTCAAGGT TIACGCCAAT CGCCAGCAOC TCTTTGACTT TGCCCATCGC 900 

CTCTCGGCCT TOCAGAGGGT GGACACATTG GAAATOCAGC GTGATGTCAC CTTGTCCTAT 960 

GTOCAGATC 969 

Sequence No.: 3
Sequence length: 1113 
Sequence type: Nucleic acid 
Strandedness : Double 
Topology: Linear 
Sequence kind: cDNA to mRNA 
Origin : 

Animal name: Homo sapiens 

Cell kind: Stomach cancer tissue 

Clone name: HP01049
Sequence characteristics : 

Characterization code: CDS 

Existence position: 57.. 1029 

Characterization method: E 

Sequence description 

ATCTCCCACT CCTCCAGCTC TTCTCACACC ACCACCCACT ACCGCAGCCT CGAGCC ATC 

Met 
1 

GOC TAT CTC COC OCA CCC GCC TAC CAG OCC ACC TAC AAC CCC ACC CTC 
Ala Tyr Val Pro Ala Pro Gly Tyr Gln Pro Thr Tyr Asn Pro Thr Leu
5 10 15

OCT TAC TAC CAC OCC ATC OCG GGC GGG CTC AAC GTG CCA ATG TCT CTT 155
Pro Tyr Tyr Gln Pro Ile Pro Gly Gly Leu Asn Val Gly Met Ser Val

20 25 30 

TAC ATC CAA GGA GTG GCC AGC GAG CAC ATG AAG OGC TTC TTC GTG AAC 203 
Tyr Ile Gln Gly Val Ala Ser Glu His Met Lys Arg Phe Phe Val Asn

35 40 45 

TIT GTG GTT GGG CAG GAT CCG GGC TCA GAC GTC GCC TTC CAC TTC AAT 251 
Phe Val Val Gly Gln Asp Pro Gly Ser Asp Val Ala Phe His Phe Asn
50 55 60 65 

CCG CGG TIT GAC GGC TUG GAC AAG GTG GTC TTC AAC ACG TTG CAG GGC 299 
Pro Arg Phe Asp Gly Trp Asp Lys Val Val Phe Asn Thr Leu Gln Gly

70 75 80 

GGG AAG TGG GGC AGC GAG GAG AGG AAG AGG AGC ATG OCC TTC AAA AAG 347 
Gly Lys Trp Gly Ser Glu Glu Arg Lys Arg Ser Met Pro Phe Lys Lys 

85 90 95 

GGT GCC GCC TIT GAG CTG CTC TTC ATA GTC CTG OCT GAG CAC TAC AAG 395 
Gly Ala Ala Phe Glu Leu Val Phe Ile Val Leu Ala Glu His Tyr Lys

100 105 110 

GTG GTG GTA AAT GGA AAT CCC TTC TAT GAG TAC GGG CAC CGG CTT CCC 443 
Val Val Val Asn Gly Asn Pro Phe Tyr Glu Tyr Gly His Arg Leu Pro 

115 120 125 

CTA CAG ATC CTC ACC CAC CTG CAA GTG GAT GGC GAT CTG CAA CTT CAA 491 
Leu Gln Met Val Thr His Leu Gln Val Asp Gly Asp Leu Gln Leu Gln
130 135 140 145 

TCA ATC AAC TIC ATC GGA GGC CAG CDC CTC CGG CCC CAC GGA CCC CCG 539 
Ser Ile Asn Phe Ile Gly Gly Gln Pro Leu Arg Pro Gln Gly Pro Pro

150 155 160 

ATG ATG CCA OCT TAC OCT GGT CCC GGA CAT TGC CAT CAA CAG CTG AAC 587 
Met Met Pro Pro Tyr Pro Gly Pro Gly His Cys His Gln Gln Leu Asn

165 170 175 

ACC CTG CCC ACC ATG GAA GGA CCC OCA ACC TTC AAC CCC CCT GTG CCA 635 
Ser Leu Pro Thr Met Glu Gly Pro Pro Thr Phe Asn Pro Pro Val Pro
180 185 190

TAT TTC GGG AGC CTG CAA GCA GGG CTC ACA GCT OGA AGA AOC ATC ATC 683
Tyr Phe Gly Arg Leu Gln Gly Gly Leu Thr Ala Arg Arg Thr Ile Ile

195 200 205 

ATC AAG GGC TAT GTG OCT CCC ACA GGC AAG AGC TTT GCT ATC AAC TTC 731 
Ile Lys Gly Tyr Val Pro Pro Thr Gly Lys Ser Phe Ala Ile Asn Phe
210 215 220 225 

AAG GTG GGC TCC TCA GGG GAC ATA GCT CTG CAC AH AAT CCC CCC ATC 779 
Lys Val Gly Ser Ser Gly Asp Ile Ala Leu His Ile Asn Pro Arg Met

230 235 240 

GGC AAC GGT ACC GTG CTC CGG AAC AGC CTT CTG AAT GGC TCG TGG CCA 827 
Gly Asn Gly Thr Val Val Arg Asn Ser Leu Leu Asn Gly Ser Trp Gly 

245 250 255 

TCC GAG GAG AAG AAG ATC ACC CAC AAC CCA TTT GGT CCC GGA CAG TTC 875 
Ser Glu Glu Lys Lys Ile Thr His Asn Pro Phe Gly Pro Gly Gln Phe

260 265 270 

TTT GAT CTG TCC AH CCC TGT GGC TTG GAT CGC TTC AAG GTT TAC CCC 923 
Phe Asp Leu Ser Ile Arg Cys Gly Leu Asp Arg Phe Lys Val Tyr Ala

275 280 285 

AAT GGC CAG CAC CTC TTT GAC ITT CCC CAT CGC CTC TCC CCC TTC CAG 971 
Asn Gly Gln His Leu Phe Asp Phe Ala His Arg Leu Ser Ala Phe Gln
290 295 300 305 

AGC GTG CAC ACA TTC GAA ATC CAG GGT CAT CTC ACC TTG TCC TAT CTC 1019
Arg Val Asp Thr Leu Glu Ile Gln Gly Asp Val Thr Leu Ser Tyr Val
310 315 320 

CAC ATC TAATCTATTC CTGGGGCCAT AACTCATGGG AAAACAGAAT TATCC 1070
Gln Ile

CCTAGCACTC CTTTCTAAGC CCCTAATAAA ATGTCTGACC CTG 1113




Claims 

1. A protein comprising an amino acid sequence represented by Sequence No. 1.

2. A cDNA encoding an amino acid sequence represented by Sequence No. 1.

3. A cDNA as claimed in Claim 2, comprising a base sequence represented by Sequence No. 2.

4. A cDNA as claimed in Claim 3, comprising a base sequence represented by Sequence No. 3.